Parameter Initialization


Benchmarking VQE Configurations: Architectures, Initializations, and Optimizers for Silicon Ground State Energy

Boutakka, Zakaria, Innan, Nouhaila, Shafique, Muhammed, Bennai, Mohamed, Sakhi, Z.

arXiv.org Artificial Intelligence

Quantum computing presents a promising path toward precise quantum chemical simulations, particularly for systems that challenge classical methods. This work investigates the performance of the Variational Quantum Eigensolver (VQE) in estimating the ground-state energy of the silicon atom, a relatively heavy element that poses significant computational challenges. Within a hybrid quantum-classical optimization framework, we implement VQE using a range of ansätze, including Double Excitation Gates, ParticleConservingU2, UCCSD, and k-UpCCGSD, combined with various optimizers such as gradient descent, SPSA, and ADAM. The main contribution of this work lies in a systematic methodological exploration of how these configuration choices interact to influence VQE performance, establishing a structured benchmark for selecting optimal settings in quantum chemical simulations. Key findings show that parameter initialization plays a decisive role in the algorithm's stability, and that the combination of a chemically inspired ansatz with adaptive optimization yields superior convergence and precision compared to conventional approaches.
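To make the configuration sweep concrete, here is a minimal PennyLane sketch of the kind of (ansatz, optimizer, initialization) grid such a benchmark explores. The 4-qubit toy Hamiltonian and single double-excitation gate are illustrative stand-ins for the silicon-atom Hamiltonian and the full ansätze, not the paper's actual setup.

```python
import pennylane as qml
from pennylane import numpy as np

# Toy 4-qubit Hamiltonian standing in for the silicon-atom Hamiltonian,
# which would instead be built with a quantum-chemistry package.
H = qml.Hamiltonian([1.0, 0.5],
                    [qml.PauliZ(0) @ qml.PauliZ(1), qml.PauliX(2)])

dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev)
def circuit(params):
    qml.BasisState(np.array([1, 1, 0, 0]), wires=range(4))  # HF-like reference
    qml.DoubleExcitation(params[0], wires=[0, 1, 2, 3])     # chemically inspired gate
    return qml.expval(H)

def run_vqe(opt, init, steps=100):
    params = np.array(init, requires_grad=True)
    for _ in range(steps):
        params = opt.step(circuit, params)
    return circuit(params)

# Sweep (optimizer, initialization) pairs, as in the benchmark; a fresh
# optimizer per run avoids carrying over Adam's internal moment estimates.
optimizers = {"GD": lambda: qml.GradientDescentOptimizer(0.1),
              "Adam": lambda: qml.AdamOptimizer(0.1)}
inits = {"zero": np.zeros(1),
         "random": np.random.uniform(-np.pi, np.pi, 1)}

for opt_name, make_opt in optimizers.items():
    for init_name, init in inits.items():
        energy = run_vqe(make_opt(), init)
        print(f"{opt_name:5s} + {init_name:6s} init: E = {energy:.6f}")
```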


VQEzy: An Open-Source Dataset for Parameter Initialization in Variational Quantum Eigensolvers

Zhang, Chi, Zheng, Mengxin, Lou, Qian, Leung, Hui Min, Chen, Fan

arXiv.org Artificial Intelligence

Variational Quantum Eigensolvers (VQEs) are a leading class of noisy intermediate-scale quantum (NISQ) algorithms, whose performance is highly sensitive to parameter initialization. Although recent machine learning-based initialization methods have achieved state-of-the-art performance, their progress has been limited by the lack of comprehensive datasets. Existing resources are typically restricted to a single domain, contain only a few hundred instances, and lack complete coverage of Hamiltonians, ansatz circuits, and optimization trajectories. To overcome these limitations, we introduce VQEzy, the first large-scale dataset for VQE parameter initialization. VQEzy spans three major domains and seven representative tasks, comprising 12,110 instances with full VQE specifications and complete optimization trajectories. The dataset is available online and will be continuously refined and expanded to support future research in VQE optimization.
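As a rough picture of what such instances could look like, the sketch below defines a hypothetical record carrying a full VQE specification plus its optimization trajectory. All field names here are invented for illustration and are not VQEzy's actual schema.

```python
# Hypothetical sketch of a VQEzy-style instance; the field names are
# illustrative placeholders, not the dataset's real format.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class VQEInstance:
    domain: str                        # e.g. "chemistry", "physics", "materials"
    hamiltonian: List[Tuple[float, str]]  # Pauli-string terms: (coeff, "ZZ...")
    ansatz: str                        # circuit template identifier
    init_params: List[float]           # initial parameter vector
    trajectory: List[List[float]] = field(default_factory=list)  # params per step
    energies: List[float] = field(default_factory=list)          # energy per step

# A learned initializer would be trained to map (hamiltonian, ansatz) to
# good init_params, using the stored trajectories/energies as supervision.
example = VQEInstance(
    domain="chemistry",
    hamiltonian=[(0.5, "ZZ"), (0.3, "XI")],
    ansatz="UCCSD",
    init_params=[0.01, -0.02],
)
```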


DiffQ: Unified Parameter Initialization for Variational Quantum Algorithms via Diffusion Models

Zhang, Chi, Zheng, Mengxin, Lou, Qian, Chen, Fan

arXiv.org Artificial Intelligence

Variational Quantum Algorithms (VQAs) [1] have emerged as leading methods for the noisy intermediate-scale quantum (NISQ) era [2]. By combining limited quantum resources with classical optimizers, they reduce reliance on fault-tolerant devices while offering resilience to noise [1], low circuit complexity [3], and design flexibility [4]. VQAs have already demonstrated success in quantum physics, chemistry, and materials science [5-7]. Despite this promise, their scalability remains a central challenge: as system size increases, optimization landscapes flatten exponentially [8], leading to vanishing gradients and poor convergence. Parameter initialization has therefore become a critical strategy [9], reshaping the landscape to enhance trainability and mitigate suboptimal convergence. Recent deep learning-based initialization methods [10-13] define the state of the art, yet they remain task-specific, depend on limited datasets, and are typically validated in narrow settings, constraining their generalizability across diverse VQA applications.
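A small, self-contained experiment can illustrate the flattening-landscape effect that motivates learned initialization: the variance of a fixed gradient component of a randomly initialized layered circuit shrinks as qubits are added. This PennyLane sketch is illustrative background only, not the DiffQ method itself.

```python
import pennylane as qml
from pennylane import numpy as np

def grad_variance(n_qubits, n_layers=5, n_samples=50):
    """Variance of one gradient component over random initializations."""
    dev = qml.device("default.qubit", wires=n_qubits)

    @qml.qnode(dev)
    def circuit(params):
        qml.StronglyEntanglingLayers(params, wires=range(n_qubits))
        return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

    shape = qml.StronglyEntanglingLayers.shape(n_layers=n_layers,
                                               n_wires=n_qubits)
    grads = []
    for _ in range(n_samples):
        params = np.random.uniform(0, 2 * np.pi, shape, requires_grad=True)
        grads.append(qml.grad(circuit)(params)[0, 0, 0])  # one fixed component
    return np.var(np.array(grads))

# Gradient variance typically decays as qubits are added (barren plateaus).
for n in (2, 4, 6, 8):
    print(f"{n} qubits: Var[dE/dtheta] = {grad_variance(n):.2e}")
```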


Issues with Neural Tangent Kernel Approach to Neural Networks

Liu, Haoran, Tai, Anthony, Crandall, David J., Huang, Chunfeng

arXiv.org Machine Learning

Neural tangent kernels (NTKs) have been proposed to study the behavior of trained neural networks from the perspective of Gaussian processes. An important result in this body of work is the theorem of equivalence between a trained neural network and kernel regression with the corresponding NTK. This theorem allows for an interpretation of neural networks as special cases of kernel regression. However, does this theorem of equivalence hold in practice? In this paper, we revisit the derivation of the NTK rigorously and conduct numerical experiments to evaluate this equivalence theorem. We observe that adding a layer to a neural network and the corresponding updated NTK do not yield matching changes in the predictor error. Furthermore, we observe that kernel regression with a Gaussian process kernel from the literature that does not account for neural network training produces prediction errors very close to those of kernel regression with NTKs. These observations suggest that the equivalence theorem does not hold well in practice and put into question whether neural tangent kernels adequately address the training process of neural networks.
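The equivalence being tested can be checked at toy scale: compare a trained one-hidden-layer ReLU network against kernel regression with its empirical NTK. The NumPy sketch below trains only the first layer with fixed output weights, a simplification of the paper's experimental setting.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 2048, 3, 20                      # width, input dim, train set size

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                        # toy regression target
Xt = rng.normal(size=(5, d))               # test inputs
W0 = rng.normal(size=(m, d))               # first-layer init (trained)
v = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed output weights

def f(W, A):
    return np.maximum(A @ W.T, 0.0) @ v    # network output on inputs A

def jac(W, A):
    gates = (A @ W.T > 0).astype(float)    # ReLU activation pattern
    # df_i/dW[j,k] = v_j * gate_ij * A[i,k], flattened per input row
    return ((gates * v)[:, :, None] * A[:, None, :]).reshape(len(A), -1)

def ntk(Wa, A, B):
    return jac(Wa, A) @ jac(Wa, B).T       # empirical NTK at weights Wa

# Kernel-regression predictor implied by the NTK at initialization.
K = ntk(W0, X, X) + 1e-8 * np.eye(n)
pred_kernel = f(W0, Xt) + ntk(W0, Xt, X) @ np.linalg.solve(K, y - f(W0, X))

# Gradient-descent training of the same network (W only, squared loss).
W, lr = W0.copy(), 0.2
for _ in range(10000):
    r = f(W, X) - y
    W -= lr * (jac(W, X).T @ r).reshape(m, d) / n
pred_net = f(W, Xt)

print(np.c_[pred_kernel, pred_net])        # columns should roughly agree
```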


A More Accurate Approximation of Activation Function with Few Spikes Neurons

Jeong, Dayena, Park, Jaewoo, Jo, Jeonghee, Park, Jongkil, Kim, Jaewook, Jang, Hyun Jae, Lee, Suyoun, Park, Seongsik

arXiv.org Artificial Intelligence

Objective: Recent deep neural networks (DNNs), such as diffusion models [1], have faced high computational demands. Thus, spiking neural networks (SNNs) have attracted considerable attention as energy-efficient neural networks. However, conventional spiking neurons, such as leaky integrate-and-fire neurons, cannot accurately represent complex non-linear activation functions, such as Swish [2]. To approximate activation functions with spiking neurons, few spikes (FS) neurons were proposed [3], but their approximation performance was limited by the lack of training methods tailored to these neurons. We therefore propose tendency-based parameter initialization (TBPI), which enhances the approximation of activation functions with FS neurons by exploiting temporal dependencies when initializing the training parameters.
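For context, a few-spikes neuron emits at most one spike per time step, with per-step thresholds T[t], subtractive resets h[t], and output weights d[t]; the sum of weighted spikes approximates the target activation. The sketch below uses a hand-set geometric parameter ladder, which yields a ReLU-like staircase; matching a non-monotone function like Swish well is precisely why learned parameters (as in the proposed TBPI) are needed. This is an illustrative baseline, not the paper's method.

```python
import numpy as np

def fs_neuron(x, T, h, d):
    """Few-spikes neuron over K = len(T) time steps (Stoeckl & Maass style)."""
    v = np.array(x, dtype=float)       # membrane potential starts at the input
    out = np.zeros_like(v)
    for t in range(len(T)):
        s = (v >= T[t]).astype(float)  # spike if membrane crosses threshold
        out += s * d[t]                # each spike contributes weight d[t]
        v -= s * h[t]                  # subtractive reset
    return out

def swish(x):
    return x / (1.0 + np.exp(-x))

K = 8
# Hand-set geometric ladder [4, 2, 1, 0.5, ...]: a binary expansion that
# quantizes ReLU on [0, 8); it cannot capture Swish's negative dip.
T = h = d = 2.0 ** -np.arange(-2, K - 2)

x = np.linspace(-4, 4, 200)
err = np.abs(fs_neuron(x, T, h, d) - swish(x))
print("max |FS - Swish| with hand-set parameters:", err.max())
# A training scheme (e.g., the proposed TBPI) would instead learn T, h, d.
```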


Meta-Learning Neural Procedural Biases

Raymond, Christian, Chen, Qi, Xue, Bing, Zhang, Mengjie

arXiv.org Artificial Intelligence

The goal of few-shot learning is to generalize and achieve high performance on new unseen learning tasks, where each task has only a limited number of examples available. Gradient-based meta-learning attempts to address this challenge by learning how to learn new tasks, embedding inductive biases informed by prior learning experiences into the components of the learning algorithm. In this work, we build upon prior research and propose Neural Procedural Bias Meta-Learning (NPBML), a novel framework designed to meta-learn task-adaptive procedural biases. Our approach aims to consolidate recent advancements in meta-learned initializations, optimizers, and loss functions by learning them simultaneously and making them adapt to each individual task to maximize the strength of the learned inductive biases. This imbues each learning task with a unique set of procedural biases that is specifically designed and selected to attain strong learning performance in only a few gradient steps. The experimental results show that by meta-learning the procedural biases of a neural network, we can induce strong inductive biases towards a distribution of learning tasks, enabling robust learning performance across many well-established few-shot learning benchmarks. Humans have an exceptional ability to learn new tasks from only a few example instances. We can often adapt quickly and effectively to new domains by building upon and utilizing past experiences of related tasks, leveraging only a small amount of information about the target domain.
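A stripped-down example of meta-learning procedural biases: on a family of 1-D quadratic tasks, jointly meta-learn an initialization and an inner-loop learning rate. This is a Meta-SGD-flavoured stand-in for NPBML's richer set of biases (initialization, optimizer, and loss), not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
w0, lr = 0.0, 0.05        # meta-learned: initialization w0 and inner step size lr
meta_lr = 0.001

for _ in range(2000):
    c = rng.normal(3.0, 0.5)            # sample a task: minimize (w - c)^2
    g_inner = 2.0 * (w0 - c)            # inner-loop gradient at the init
    w_adapt = w0 - lr * g_inner         # one adaptation step
    r = 2.0 * (w_adapt - c)             # d(outer loss)/d(w_adapt)
    w0 -= meta_lr * r * (1.0 - 2.0 * lr)   # backprop through the inner step
    lr -= meta_lr * r * (-g_inner)

# lr drifts toward 0.5, where a single gradient step solves any quadratic task.
print(f"meta-learned init = {w0:.3f}, inner learning rate = {lr:.3f}")
```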


Cyclic Sparse Training: Is it Enough?

Gadhikar, Advait, Nelaturu, Sree Harsha, Burkholz, Rebekka

arXiv.org Artificial Intelligence

The success of iterative pruning methods in achieving state-of-the-art sparse networks has largely been attributed to improved mask identification and an implicit regularization induced by pruning. We challenge this hypothesis and instead posit that their repeated cyclic training schedules enable improved optimization. To verify this, we show that pruning at initialization is significantly boosted by repeated cyclic training, even outperforming standard iterative pruning methods. We conjecture that the dominant mechanism is a better exploration of the loss landscape, leading to a lower training loss. However, at high sparsity, repeated cyclic training alone is not enough for competitive performance. A strong coupling between the learnt parameter initialization and the mask seems to be required. Standard methods obtain this coupling via expensive pruning-training iterations, starting from a dense network. To achieve this with sparse training instead, we propose SCULPT-ing, i.e., repeated cyclic training of any sparse mask followed by a single pruning step to couple the parameters and the mask, which matches the performance of state-of-the-art iterative pruning methods in the high-sparsity regime at reduced computational cost.
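The recipe can be sketched in a few lines, with linear regression standing in for the network: repeated cosine-schedule training of a fixed sparse mask, then one magnitude-pruning step to couple mask and weights. This is an illustrative skeleton of the described procedure, not the paper's experimental code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
w_true = np.zeros(50)
w_true[:5] = rng.normal(size=5)            # only 5 informative coordinates
y = X @ w_true

def train_cycle(w, mask, steps=200, lr_max=0.1):
    for t in range(steps):
        lr = lr_max * 0.5 * (1 + np.cos(np.pi * t / steps))  # cosine LR cycle
        grad = X.T @ (X @ (w * mask) - y) / len(X)
        w = w - lr * grad * mask                             # masked update
    return w

w = rng.normal(size=50) * 0.1
mask = (rng.random(50) < 0.4).astype(float)  # random sparse mask at init

for _ in range(5):                           # repeated cyclic training
    w = train_cycle(w, mask)

# Single magnitude-pruning step to the target sparsity couples mask & weights.
k = 10                                       # keep 10 of 50 weights (80% sparse)
keep = np.argsort(-np.abs(w * mask))[:k]
mask = np.zeros(50)
mask[keep] = 1.0
w = train_cycle(w, mask)
print("final MSE:", np.mean((X @ (w * mask) - y) ** 2))
```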


Uncertainty Distribution Assessment of Jiles-Atherton Parameter Estimation for Inrush Current Studies

Ugarte-Valdivielso, Jone, Aizpurua, Jose I., Barrenetxea-Iñarra, Manex

arXiv.org Artificial Intelligence

Transformers are one of the key assets in AC distribution grids and renewable power integration. During transformer energization, inrush currents appear, which degrade the transformer and can cause grid instability events. These inrush currents are a consequence of the transformer's magnetic core saturating during its connection to the grid. Transformer cores are normally described by the Jiles-Atherton (JA) model, which contains five parameters. These parameters can be estimated by metaheuristic-based search algorithms, whose convergence depends strongly on how the parameters are initialized. The most popular strategy for JA parameter initialization is a random uniform distribution. However, techniques such as parameter initialization with Probability Density Functions (PDFs) have been shown to improve accuracy over random methods. In this context, this work presents a framework to assess the impact of different parameter initialization strategies on the performance of JA parameter estimation for inrush current studies. Depending on the available data and expert knowledge, uncertainty levels are modelled with different PDFs. Moreover, three different metaheuristic search algorithms are employed on two different core materials, and their accuracy and computational time are compared. Results show an improvement in both the accuracy and the computational time of the metaheuristic-based algorithms when PDF parameter initialization is used.
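The contrast between initialization strategies can be sketched as follows: seed a metaheuristic's population either uniformly over the parameter bounds or from expert-informed PDFs around plausible JA values. The Gaussian priors, parameter values, and quadratic surrogate loss below are illustrative placeholders for the actual inrush-current waveform fit.

```python
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
# Illustrative values for the five JA parameters (Ms, a, alpha, k, c).
true = np.array([1.6e6, 1100.0, 1.6e-3, 400.0, 0.2])
bounds = [(1e5, 3e6), (10, 5000), (1e-5, 1e-2), (10, 2000), (0.0, 1.0)]

def loss(p):
    # Surrogate for the inrush-current misfit: relative distance to "true".
    return np.sum(((p - true) / true) ** 2)

pop = 40
uniform_init = np.array([[rng.uniform(lo, hi) for lo, hi in bounds]
                         for _ in range(pop)])
# PDF initialization: Gaussians centred on expert guesses, clipped to bounds.
pdf_init = np.clip(rng.normal(true, 0.2 * true, size=(pop, 5)),
                   [b[0] for b in bounds], [b[1] for b in bounds])

for name, init in [("uniform", uniform_init), ("PDF", pdf_init)]:
    res = differential_evolution(loss, bounds, init=init, maxiter=50,
                                 seed=0, tol=1e-12)
    print(f"{name:8s} init -> loss {res.fun:.3e} after {res.nfev} evaluations")
```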


When does MAML Work the Best? An Empirical Study on Model-Agnostic Meta-Learning in NLP Applications

Liu, Zequn, Zhang, Ruiyi, Song, Yiping, Ju, Wei, Zhang, Ming

arXiv.org Artificial Intelligence

Model-Agnostic Meta-Learning (MAML) has been successfully employed in NLP applications, including few-shot text classification and multi-domain low-resource language generation. Many factors, including data quantity, similarity among tasks, and the balance between the general language model and task-specific adaptation, can affect MAML's performance in NLP, but few works have studied them thoroughly. In this paper, we conduct an empirical study to investigate these factors and conclude, based on the experimental results, when MAML works best.


On the Learning Dynamics of Attention Networks

Vashisht, Rahul, Ramaswamy, Harish G.

arXiv.org Artificial Intelligence

Attention models are typically learned by optimizing one of three standard loss functions, variously called soft attention, hard attention, and latent variable marginal likelihood (LVML) attention. All three paradigms are motivated by the same goal of finding two models: a 'focus' model that 'selects' the right segment of the input, and a 'classification' model that processes the selected segment into the target label. However, they differ significantly in the way the selected segments are aggregated, resulting in distinct dynamics and final results. We observe a unique signature of models learned using these paradigms and explain it as a consequence of the evolution of the classification model under gradient descent when the focus model is fixed. We also analyze these paradigms in a simple setting and derive closed-form expressions for the parameter trajectory under gradient flow. With the soft attention loss, the focus model improves quickly at initialization and splutters later on; the hard attention loss behaves in the opposite fashion. Based on our observations, we propose a simple hybrid approach that combines the advantages of the different loss functions, and demonstrate it on a collection of semi-synthetic and real-world datasets.
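The three losses differ only in where the aggregation over segments happens, which a two-segment toy makes explicit: soft attention averages features before the classifier, hard attention averages the log-loss over segments, and LVML averages the likelihood itself. The 1-D logistic setup below is illustrative, not the paper's setting.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([2.0, -1.0])      # segment features: x[0] is the informative one
alpha = np.array([0.5, 0.5])   # focus model's attention weights (softmax output)
w = 1.0                        # classifier weight; target label y = 1

p_seg = sigmoid(w * x)                         # P(y=1 | each segment)

loss_soft = -np.log(sigmoid(w * alpha @ x))    # aggregate features, then classify
loss_hard = -(alpha @ np.log(p_seg))           # expected log-loss over segments
loss_lvml = -np.log(alpha @ p_seg)             # log of the marginal likelihood

print(f"soft {loss_soft:.3f}  hard {loss_hard:.3f}  LVML {loss_lvml:.3f}")
```

With the same attention weights and classifier, the three losses take different values (and different gradients), which is the aggregation difference the abstract describes.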